[Test] Allow allocation in mixed cluster #129680
Conversation
The RunningSnapshotIT upgrade test adds shutdown markers to all nodes and removes them once all nodes are upgraded. If an index gets created in a mixed cluster, for example by ILM or deprecation messages, it cannot be allocated because all nodes are shutting down. Since the cluster ready check between node upgrades expects a yellow cluster, the unassigned index prevents the ready check from succeeding and eventually makes it time out. This PR fixes it by removing the shutdown marker for the first upgraded node so that it can host new indices. Resolves: elastic#129644 Resolves: elastic#129645 Resolves: elastic#129646
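For reference, a minimal sketch of what clearing the marker could look like from an ESRestTestCase-style upgrade test using the node shutdown API; the helper name and node-id handling are illustrative, not the actual test code:

```java
import org.elasticsearch.client.Request;

// Illustrative helper, assumed to sit inside the ESRestTestCase-derived upgrade test;
// the method name and nodeId parameter are hypothetical, not the actual change.
private void clearShutdownMarker(String nodeId) throws Exception {
    // Deleting the shutdown marker makes the node eligible to host newly created shards again.
    client().performRequest(new Request("DELETE", "/_nodes/" + nodeId + "/shutdown"));
}
```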
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)
nicktindall left a comment:
Nice, so I assume this still doesn't allow the snapshot to complete during the upgrade, because there will be 2 shards that can't be assigned, since only 1 node has no shutdown marker?
Those two shards remain on their initial nodes. They are not unassigned because they are not new shards. The snapshot cannot complete because:
So yeah, the snapshot can complete only when all nodes are upgraded and the remaining shutdown markers are removed.
@elasticmachine update branch
This is the same failure as observed in #129644, for which the original fix #129680 did not really work because of the ordering of the checks: the shutdown marker is removed only after the cluster passes the ready check so that new shards can be allocated, but the cluster cannot pass the ready check before those shards are allocated. Hence the circular dependency. In hindsight, there is no need to put a shutdown marker on every node. It is only needed on the node that upgrades last, to prevent the snapshot from completing during the upgrade process. This PR does that, which ensures there are always 2 nodes available to host new shards. Resolves: #132135 Resolves: #132136 Resolves: #132137
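As a rough sketch of the revised approach (not the actual change), registering the marker on just that node via the node shutdown API could look like this in an ESRestTestCase-style test; the helper name and node-id parameter are illustrative:

```java
import org.elasticsearch.client.Request;

// Illustrative sketch: register a "restart" shutdown marker only on the node that will
// upgrade last, leaving the other nodes free to host any indices created mid-upgrade.
private void markLastUpgradingNodeForShutdown(String lastNodeId) throws Exception {
    final Request request = new Request("PUT", "/_nodes/" + lastNodeId + "/shutdown");
    // type and reason are the required fields of the node shutdown API.
    request.setJsonEntity("{\"type\": \"restart\", \"reason\": \"keep the running snapshot from completing during the upgrade\"}");
    client().performRequest(request);
}
```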
Backport #132233: cherry picked from commit f39ccb5 (conflicts: muted-tests.yml).